Pose Detection¶
In this training, we are going to try to detect the positions of a person's arms, hands, legs, and feet from an input image. This is called pose detection. We use a model called "OpenPose", which can detect up to 10 people in an image.
Glossary¶
| Term | Description |
|---|---|
| AI Model | An artificial intelligence model is a computer program that is trained on a lot of data and usually performs calculations on matrices. |
| Array | Another word for list. Multiple variables or items stored under one variable name. |
| DPU | Data Processing Unit |
| FPGA | Field Programmable Gate Array; a specialized device that allows you to run your model faster |
| Matrix | A table of numbers |
| OpenPose | This is the pose detection model we are using! |
| Pose Detection | Understanding the pose of a person from an image, such as standing, sitting, etc.; easy for humans, hard for computers |
| Tensor | A multi-dimensional table of numbers. A matrix is a (two-dimensional) tensor. |
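To make the matrix/tensor distinction concrete, here is a tiny numpy sketch (numpy is imported in the next section; the shapes here are just illustrative):

```python
import numpy as np

matrix = np.zeros((3, 4))        # a matrix: a 2-dimensional tensor (3 rows, 4 columns)
image = np.zeros((368, 368, 3))  # a 3-dimensional tensor: height, width, color channels

print(matrix.ndim)  # 2
print(image.ndim)   # 3
```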
1. Hardware Setup¶
For this training you will need:
- KRIA SoM
- Webcam
Connect the webcam now, if it is not already connected.
2. Imports¶
Let's start by loading in all the functionality we are going to need later. It is considered good style to put all of your imports at the beginning, so that's what we do. The comments explain what we use each import for.
%%bash
pip install scikit-image
import os # used to execute bash commands to download and uncompress stuff
import cv2 # OpenCV image processing library
import time
import numpy as np # For math
import matplotlib.pyplot as plt # For plotting stuff
from pynq_dpu import DpuOverlay # Overlay for the FPGA
import IPython.display # For displaying a live feed of video
import matplotlib.patches as patches
from skimage.metrics import structural_similarity as ssim
3. FPGA Overlays¶
We run the model on specialized hardware that will allow it to execute faster than on a regular processor. The fastest option would be if we could build custom hardware -- a specialized electronic circuit -- that performs exactly the computations we need. However, soldering your own circuits is hard and takes a lot of time. Also, once such a specialized circuit is built, it can only be used for one thing -- the specific thing it was built for.
FPGAs are the next best thing. They are like circuits, but instead of having to solder them physically, the different electrical connections can be "programmed". FPGAs consist of a large number of gates that we can use to route electricity (which we use to encode data) different ways. This is different from "normal" processor programming: Regular processors read a stream of instructions (our program) and execute those instructions. On FPGAs, we configure a circuit once ahead of time, and can then use it to perform the function it was designed to do multiple times very quickly. The KRIA SoM comes with an FPGA.
Engineering these FPGA circuits is hard. For this reason, we will use a circuit somebody else already built for us! This is called an "overlay". The circuit we will use allows us to do lightning-fast data processing. This is exactly what we need to perform machine learning inference. Let us first load this circuit. By running the following step, you will configure the electronic circuit on your FPGA as a data processing unit (DPU):
overlay = DpuOverlay("dpu.bit")
Note: If you run into a "Device or resource busy" error, quit all your Jupyter notebook kernels and restart this one. Then, try again. You can quit notebook kernels by clicking Kernel > Shutdown and restart by clicking Kernel > Restart.
4. AI Model Setup¶
How ML models work internally is not super important to us as users of the models. What we need to know is that these models take an input matrix, perform some calculations, and produce an output matrix of potentially a different dimension.
What the input and output matrices mean depends on the model. In our model, the input matrix represents an image: each cell in the matrix can be thought of as a pixel -- a red/green/blue color value at that coordinate in the image. The model outputs many matrices, which can be interpreted as probabilities that a limb is present at a certain pixel.
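As a sketch of the input side, here is a made-up 2 x 2 image (not the model's real 368 x 368 input) represented as a matrix of color values:

```python
import numpy as np

# A 2x2 image: each cell holds a (red, green, blue) value for that pixel
tiny_image = np.array([
    [[255, 0, 0], [0, 255, 0]],      # red pixel, green pixel
    [[0, 0, 255], [255, 255, 255]],  # blue pixel, white pixel
], dtype=np.uint8)

print(tiny_image.shape)  # (2, 2, 3): height, width, color channels
```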
When models are built, they are "trained" on a large set of input data. This simply means that the coefficients of the calculations are optimized. In order to run the model on our FPGA overlay, some additional modifications are required. Someone has already done this for us, so we only have to download the model.
Here, we load the model onto the overlay. There is also some code to download the model, but we won't need that, since we already have the model.
This model expects input of dimensions 368x368.
archive_filename = "openpose_pruned_0_3.tar.gz"
extracted_model_path = "openpose_pruned_0_3.xmodel"
# The following will download the model if it is not already present in the folder:
# model_download_url = "https://www.xilinx.com/bin/public/openDownload?filename=openpose_pruned_0_3-zcu102_zcu104_kv260-r2.5.0.tar.gz"
# os.system("wget -nv -O \"{}\" \"{}\"".format(archive_filename, model_download_url))
# os.system("tar -xvf \"{}\"".format(archive_filename))
overlay.load_model(extracted_model_path)
5. Load an Image¶
To test the model, let's load an image of a person and see if the model can detect the pose.
The model works on matrices of size 368 x 368, so first we need to get our input into the right format. We load a JPEG, decompress it into its raw pixel values, and put each pixel value on a 0 - 1 scale. Since the input image does not exactly match the 368 x 368 format expected by the model, we also pad and crop it slightly. We use the OpenCV library, which provides functions for this processing.
Image: Sydney International Tennis ATP, licensed CC-BY-SA 2.0.
# The test image should have been shipped with this notebook; if not, download it like so:
# img_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/8/83/Sydney_International_Tennis_ATP_%2833040179108%29.jpg/640px-Sydney_International_Tennis_ATP_%2833040179108%29.jpg"
# os.system("wget -nv -O test_image.jpeg {}".format(img_url))
The image displays with distorted colors because OpenCV reads images as BGR while matplotlib expects RGB. We don't convert, because the OpenPose model expects BGR too.
img = cv2.imread("img/test_image.jpeg")
# img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # Need BGR for OpenPose
height, width = img.shape[0], img.shape[1]
aspect_ratio = width / height
width, height, aspect_ratio, plt.imshow(img)
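If you do want to fix the colors for display, note that the BGR-to-RGB conversion is just a reversal of the channel axis; here is a minimal numpy-only sketch (no cv2 needed):

```python
import numpy as np

bgr_pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)  # one pure-blue pixel in BGR order
rgb_pixel = bgr_pixel[..., ::-1]                       # reverse the channel axis

print(rgb_pixel[0, 0].tolist())  # [0, 0, 255] -- blue is now in the last channel
```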
6. Prepare the DPU¶
Different models have different input and output sizes. Let us see what the expected dimensions are for this model.
dpu = overlay.runner
inputTensors = dpu.get_input_tensors()
outputTensors = dpu.get_output_tensors()
shapeIn = tuple(inputTensors[0].dims)
shapeOut = tuple(outputTensors[0].dims)
outputSize = int(outputTensors[0].get_data_size() / shapeIn[0])
shapeIn, shapeOut, outputSize
7. Adjust the Image¶
As we can see, the model expects different dimensions than the image we read in previously. To address this, let's pad the image a little on the top and bottom:
xpad = 0
ypad = 50
greyscale_color = 200 # padding color on a scale of 0 (black) to 255 (white)
canvas = greyscale_color*np.ones((height+2*ypad, width+2*xpad, 3), dtype=img.dtype)
canvas[ypad:height+ypad, xpad:width+xpad, :] = img
img = canvas
plt.imshow(canvas)
Now, let's resize and crop it to get the corrected dimensions:
new_height = shapeIn[1]
img_resized = cv2.resize(img,
(int(aspect_ratio*new_height), # new width; maintain aspect ratio
new_height)) # new height
print(img.shape)
print(shapeIn)
left_bound = int(img.shape[1]/2 - shapeIn[2]/2) # Half of original width minus half of input width
right_bound = left_bound + shapeIn[2] # left bound plus image width
crop_width = right_bound-left_bound
print(f'Cropped X range: {left_bound}:{right_bound} Width: {crop_width}')
upper_bound = int(img.shape[0]/2 - shapeIn[1]/2) # Half of original height minus half of input height
lower_bound = upper_bound + shapeIn[1] # upper bound plus input height (upper bound is numerically smaller than lower bound)
crop_height = lower_bound-upper_bound
print(f'Cropped Y range: {upper_bound}:{lower_bound} Height: {crop_height}')
img_cropped = img[upper_bound:lower_bound, left_bound:right_bound, :]
img_cropped = img_cropped.astype(np.float32) / 255.0 # Make it floats
plt.imshow(img_cropped), img_cropped.shape
Here is how the crop compares to the original photo.
fig, ax = plt.subplots()
# Create a Rectangle patch
rect = patches.Rectangle((left_bound, upper_bound), crop_width, crop_height, linewidth=2, edgecolor='red', facecolor='none')
# Add the rectangle to the plot
ax.add_patch(rect)
plt.imshow(img)
8. Run the Model¶
Now that the model is loaded, the DPU is ready, and the image has been properly prepared, there's nothing stopping us from running the model!
# Prepare an array of zeros with the correct output dimensions; the DPU will fill in the results in here, but we
# have to prepare some space for it.
output_data = [np.zeros(shapeOut, dtype=np.float32, order="C")]
# We will be putting our image into this array:
input_data = [np.zeros(shapeIn, dtype=np.float32, order="C")]
# ... but first, we must adjust the shape of the matrix slightly ...
processed_image = img_cropped
processed_image = processed_image.reshape(shapeIn).astype(np.float32, order="C")
input_data[0] = processed_image
# Now, we can send our input to the model ...
job_id = dpu.execute_async(input_data, output_data)
# ... and wait for it to be done doing its calculation
dpu.wait(job_id)
# Now, the results are in output_data
But did it work? Let's plot the output of the model!
The model will output several matrices. One of those is a heat map that gives the probabilities that a limb is present at a certain coordinate. A value close to 1 indicates that the model thinks there is a limb at that pixel. For now, we will just take the max of these matrices to get an interesting output. This loses some information, i.e. which limb the model detected, but it gives us something interesting to look at.
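Here is a tiny sketch of what taking the max over the limb axis does (made-up numbers, two hypothetical limb maps instead of the model's real output):

```python
import numpy as np

# Two 2x2 probability maps stacked along the last axis, one per hypothetical limb
maps = np.zeros((2, 2, 2))
maps[0, 0, 0] = 0.9  # limb 0 is likely at pixel (0, 0)
maps[1, 1, 1] = 0.8  # limb 1 is likely at pixel (1, 1)

combined = np.max(maps, axis=2)  # keep only the highest probability per pixel
print(combined)  # [[0.9, 0.0], [0.0, 0.8]]
```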
plt.imshow(img_cropped)
heatmap = np.max(output_data[0][0], axis=2)
heatmap = cv2.resize(heatmap, (shapeIn[2], shapeIn[1]))
plt.imshow(heatmap, cmap='hot', interpolation='nearest', alpha=0.4)
Looking good!
9. Put it all in functions¶
Next, to make things a little more interesting, we will try to apply the model to a live video feed. For this, let us first take all the stuff we programmed above, and put it into functions so we can reuse it without having to repeat ourselves.
def preprocess_image(img):
# Convert RGB to BGR
img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
# Get dimensions
height, width = img.shape[0], img.shape[1]
aspect_ratio = width / height
# Padding
xpad = 40
ypad = 40
canvas = 255*np.ones((height+2*ypad, width+2*xpad, 3), dtype=img.dtype)
canvas[ypad:height+ypad, xpad:width+xpad, :] = img
img = canvas
# Cropping
left_bound = int(img.shape[1]/2 - shapeIn[2]/2) # Half of original width minus half of input width
right_bound = left_bound + shapeIn[2] # left bound plus image width
crop_width = right_bound-left_bound
# print(f'Cropped X range: {left_bound}:{right_bound} Width: {crop_width}')
upper_bound = int(img.shape[0]/2 - shapeIn[1]/2) # Half of original height minus half of input height
    lower_bound = upper_bound + shapeIn[1] # upper bound plus input height (upper bound is numerically smaller than lower bound)
crop_height = lower_bound-upper_bound
# print(f'Cropped Y range: {upper_bound}:{lower_bound} Height: {crop_height}')
img_cropped = img[upper_bound:lower_bound, left_bound:right_bound, :]
img_cropped = img_cropped.astype(np.float32) / 255.0 # Make it floats
# plt.imshow(img_cropped), img_cropped.shape
return img_cropped
def run_model(processed_image):
# Convert Data type to float 32
processed_image = processed_image.reshape(shapeIn).astype(np.float32, order="C")
# Setup Inputs and Outputs
output_data = [np.zeros(shapeOut, dtype=np.float32, order="C")]
input_data = [np.zeros(shapeIn, dtype=np.float32, order="C")]
input_data[0] = processed_image
# Run Model
job_id = dpu.execute_async(input_data, output_data)
dpu.wait(job_id)
return output_data[0]
10. Getting a live video feed¶
You may have to re-run the setup code block below if it can't open the camera.
# Setup
capture = cv2.VideoCapture(0) # Get a handle on the camera; if you have multiple cameras, you can use different indices
capture.set(cv2.CAP_PROP_BUFFERSIZE, 1) # Disable buffering
capture.set(cv2.CAP_PROP_FRAME_WIDTH, shapeIn[2]) # set capture width
capture.set(cv2.CAP_PROP_FRAME_HEIGHT, shapeIn[1]) # set capture height
num_frames = 3 # set the number of frames you want to run for.
After the initial setup above, let's run the same processing for num_frames frames in a row. We will show each frame after the model has run.
for i in range(num_frames):
# Capture a photo
success, newimg = capture.read()
if not success:
print("Error reading webcam image.")
break
#Process Image
img_cropped = preprocess_image(newimg)
output = run_model(img_cropped)
    heatmap = np.max(output[0], axis=2)
heatmap = cv2.resize(heatmap, (shapeIn[2], shapeIn[1]))
# Choose what to display
plt.clf() # Clears previous frame
plt.imshow(img_cropped) # Displays processed image
plt.imshow(heatmap, cmap='hot', interpolation='nearest', alpha=0.4) # Displays heatmap over image
# Display video
IPython.display.clear_output(wait=True)
IPython.display.display(plt.gcf())
capture.release()
11. Comparing Heatmaps¶
Before we get started, let's define some functions that we will use. Mainly, we need a function to compare heatmaps, which you can find defined below.
def compare(img1, img2, stride=15, show_comp=False):
    # Convert heatmaps to grayscale if they are not already
    try:
        img1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    except cv2.error:
        print("Image 1 already in grayscale format")
    try:
        img2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
    except cv2.error:
        print("Image 2 already in grayscale format")
    # Pad img1 so img2 can slide past its borders during the comparison
    xpad = int(img2.shape[1]/2)
    ypad = int(img2.shape[0]/2)
    greyscale_color = 0 # padding color on a scale of 0 (black) to 255 (white)
    # Use float32 so the subtraction below cannot wrap around like uint8 would
    canvas1 = greyscale_color*np.ones((img1.shape[0]+2*ypad, img1.shape[1]+2*xpad), dtype=np.float32)
    canvas1[ypad:img1.shape[0]+ypad, xpad:img1.shape[1]+xpad] = img1
    # Create matrix to store comparison data
    comp_matrix_x = int(img1.shape[1]/stride)
    comp_matrix_y = int(img1.shape[0]/stride)
    comparison_matrix = np.zeros((comp_matrix_x, comp_matrix_y), dtype=np.int64)
    # Slide img2 across canvas1 and record the error at each offset:
    for x in range(comp_matrix_x):
        for y in range(comp_matrix_y):
            canvas2 = greyscale_color*np.ones((canvas1.shape[0], canvas1.shape[1]), dtype=canvas1.dtype) # init blank canvas
            canvas2[y*stride:img2.shape[0]+y*stride, x*stride:img2.shape[1]+x*stride] = img2 # place img2 at this x,y offset
            sse = ((canvas1 - canvas2)**2).sum() # sum of squared errors at this offset
            comparison_matrix[x][y] = sse # store this offset's error in comparison_matrix
    # Show the comparison matrix. This is useful for seeing where in the image the best match was found.
    if (show_comp):
        plt.imshow(comparison_matrix)
        plt.show()
    # Return minimum error score (lower means a better match)
    return comparison_matrix.min()/1000000
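As an aside, the `structural_similarity` function we imported at the top (as `ssim`) is a ready-made alternative for comparing two images. Unlike `compare()`, it returns a score between -1 and 1, where 1 means identical. A minimal sketch on synthetic arrays:

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

a = np.zeros((64, 64), dtype=np.float32)
a[20:40, 20:40] = 1.0            # a bright square on a dark background
shifted = np.roll(a, 5, axis=1)  # the same square, shifted 5 pixels right

identical_score = ssim(a, a, data_range=1.0)
shifted_score = ssim(a, shifted, data_range=1.0)
print(identical_score, shifted_score)
```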
Here is a look at a heatmap we are going to be testing:
img_path = '/home/root/jupyter_notebooks/PYNQ_Bootcamp/bootcamp_sessions/PYNQ 503 - Pose Detection/img/'
filename = 'Template.jpg'
test_heatmap = cv2.imread(img_path + filename)
plt.imshow(test_heatmap)
Now let's load the template we will be comparing to:
img_path = '/home/root/jupyter_notebooks/PYNQ_Bootcamp/bootcamp_sessions/PYNQ 503 - Pose Detection/img/'
filename = 'Template1.jpg'
template = cv2.imread(img_path + filename)
plt.imshow(template)
Let's compare the two! The comparison function we defined before returns a low value if the two heatmaps are similar. Ideally, it would return 0 when comparing a photo to itself, but the function trades some accuracy for speed (it only checks offsets every stride pixels), so the score may not be exactly zero.
score1 = compare(template, test_heatmap)
print(f'Comparing the two heatmaps gives a score of: {score1}')
score2 = compare(template, template)
print(f'Comparing the template heatmap with itself gives a score of: {score2}')
Now what if we compare it to a completely different heatmap?
img_path = '/home/root/jupyter_notebooks/PYNQ_Bootcamp/bootcamp_sessions/PYNQ 503 - Pose Detection/img/'
filename = 'Template3.jpg'
test_heatmap2 = cv2.imread(img_path + filename)
plt.imshow(test_heatmap2)
score3 = compare(template, test_heatmap2)
print(f'Comparing the two different heatmaps gives a score of: {score3}')
Notice how the score is higher when we compare two heatmaps that don't look alike!
12. Detecting a Pose Using Templates¶
With multiple templates to choose from, we can figure out which pose a person is striking by comparing against each template and seeing which gives the lowest score. Let's load in our potential poses: Power pose, Normal stance, and Wacky!
img_path = '/home/root/jupyter_notebooks/PYNQ_Bootcamp/bootcamp_sessions/PYNQ 503 - Pose Detection/img/'
filename = 'power_pose.jpg'
template_power = cv2.imread(img_path + filename)
plt.imshow(template_power)
plt.show()
img_path = '/home/root/jupyter_notebooks/PYNQ_Bootcamp/bootcamp_sessions/PYNQ 503 - Pose Detection/img/'
filename = 'normal_stance.jpg'
template_normal = cv2.imread(img_path + filename)
plt.imshow(template_normal)
plt.show()
img_path = '/home/root/jupyter_notebooks/PYNQ_Bootcamp/bootcamp_sessions/PYNQ 503 - Pose Detection/img/'
filename = 'wacky.jpg'
template_wacky = cv2.imread(img_path + filename)
plt.imshow(template_wacky)
plt.show()
Let's compare our photo from earlier to these three templates and see which it is closest to:
#Here is the test photo:
plt.imshow(test_heatmap)
score1 = compare(template_power, test_heatmap)
print(f'Compared to Power Stance: {score1}')
score2 = compare(template_normal, test_heatmap)
print(f'Compared to Normal: {score2}')
score3 = compare(template_wacky, test_heatmap)
print(f'Compared to Wacky: {score3}')
Hooray! Our lowest score is for the normal stance, which is indeed the template most similar to our original heatmap.
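The pick-the-lowest-score logic above can be wrapped into a small helper. This is a hypothetical sketch: it uses a plain mean-squared-error scorer as a stand-in so it runs on its own; in the notebook you would pass the `compare` function from section 11 as the scoring function instead:

```python
import numpy as np

def mse_score(template, heatmap):
    # Stand-in scorer: mean squared error (lower = more similar)
    diff = template.astype(np.float32) - heatmap.astype(np.float32)
    return float(np.mean(diff ** 2))

def classify_pose(heatmap, templates, score_fn=mse_score):
    # Score the heatmap against every template; the lowest score wins
    scores = {name: score_fn(tmpl, heatmap) for name, tmpl in templates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Made-up templates for illustration
templates = {
    "power": np.zeros((8, 8)),
    "normal": np.ones((8, 8)),
}
best, scores = classify_pose(0.9 * np.ones((8, 8)), templates)
print(best)  # normal
```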
13. Generating Your Own Templates:¶
Let's generate our own template!
# Setup
capture = cv2.VideoCapture(0) # Get a handle on the camera; if you have multiple cameras, you can use different indices
capture.set(cv2.CAP_PROP_BUFFERSIZE, 1) # Disable buffering
capture.set(cv2.CAP_PROP_FRAME_WIDTH, shapeIn[2]) # set capture width
capture.set(cv2.CAP_PROP_FRAME_HEIGHT, shapeIn[1]) # set capture height
success, newimg = capture.read()
img_cropped = preprocess_image(newimg)
output = run_model(img_cropped)
heatmap = np.max(output[0], axis=2)
heatmap = cv2.resize(heatmap, (shapeIn[2], shapeIn[1]))
plt.imshow(img_cropped)
plt.imshow(heatmap, cmap='hot', interpolation='nearest', alpha=0.4) # Displays heatmap over image
img_path = '/home/root/jupyter_notebooks/PYNQ_Bootcamp/bootcamp_sessions/PYNQ 503 - Pose Detection/img/'
filename = 'my_template.jpg'
cv2.imwrite(img_path + filename, 255*heatmap)
# close camera
capture.release()
14. You Made It!¶
Phew, that was a long one, wasn't it? This can be a particularly finicky model to deal with, so don't get too frustrated if you struggle with it. As always, if you get stuck, reach out and ask questions!
Don't forget to clean up your environment too!
del dpu
del overlay